ABSTRACT

Multidisciplinary design optimization (MDO) gives rise to nonlinear optimization problems 
characterized by a large number of constraints that naturally occur in blocks. We propose a 
class of multilevel optimization methods motivated by the structure and number of constraints 
and by the expense of the derivative computations for MDO. The algorithms are an extension to 
the nonlinear programming problem of the successful class of local Brown-Brent algorithms for 
nonlinear equations. Our extensions allow the user to partition constraints into arbitrary blocks to 
fit the application, and they separately process each block and the objective function, restricted to 
certain subspaces. The methods use trust regions as a globalization strategy, and they have been
shown to be globally convergent under reasonable assumptions. The multilevel algorithms can be 
applied to all classes of MDO formulations. Multilevel algorithms for solving nonlinear systems 
of equations are a special case of the multilevel optimization methods. In this case, they can be 
viewed as a trust-region globalization of the Brown-Brent class.

1 Introduction


This work is concerned with a class of methods, called multilevel optimization algorithms, for 
solving the nonlinear equality constrained optimization problem, i.e., 

Problem EQC: 


minimize $f(x)$
subject to $C(x) = 0$,

where $f : \mathbb{R}^n \to \mathbb{R}$ and $C : \mathbb{R}^n \to \mathbb{R}^m$, $m < n$, are at least twice continuously differentiable.

The proposed class of algorithms can be used to solve any general nonlinear equality constrained 
optimization problem, but its development has been motivated by the engineering design problems 
that give rise to large-scale optimization formulations with constraints occurring naturally in blocks.
In particular, in the multidisciplinary design optimization (MDO) environment, the sheer number 
of constraints, the structure of the problems, and the expense of the derivative computations 
necessitate the development of flexible algorithms that allow the user to partition the problem into 
a set of smaller problems. 

While there are a number of nonlinear optimization methods that attack large problems by
decomposing them into several smaller ones, these methods require the problems to have a special 
structure, for example, separability and convexity. 

In particular, in engineering, decomposition and multilevel optimization have been used to 
solve large problems for some time. See [15] and [29] for a survey. The process of decomposition 
and multilevel formulation generally depends on identifying groups of variables and constraints 
that influence each other only weakly. The problem is then decomposed into such weakly coupled
subproblems in various possible formulations, some hierarchic, some nonhierarchic. Recent
developments in formulations can be found in [3] and [9]. Some of the approaches in [3] have
proven successful for many problems. To be more widely applicable, these approaches require the
development of theoretical foundations.

We propose a class of multilevel optimization methods (see [1]), for solving the nonlinear equality 
constrained optimization problem characterized by the following features: 

• The constraints of the problem can be partitioned into blocks in any manner suitable to an 
application, or in any arbitrary manner at all. The analysis of the methods assumes certain 
standard smoothness and boundedness properties, but no other assumptions are made on the 
structure of the problem. There is no need to identify the weakly coupled groups of variables 
and constraints, although that may be helpful in practice. If all constraints and variables 
are strongly coupled, the partitioning can be done according to any other criterion useful to 
a particular application, for example, just the size of constraint blocks. The algorithm then 
solves progressively smaller dimensional subproblems to arrive at the trial step. 

• The multilevel methods belong to the class of out-of-core methods. To the authors’ knowledge, 
the multilevel algorithms are the only algorithms for general nonlinear optimization problems 
that require only a currently processed part of the constraints to be held in memory. Thus, 
theoretically, there is no limit to the size of the problem the methods can handle.

• The trial steps computed by the algorithm are required to satisfy very mild conditions, both
theoretically and computationally. In fact, the substeps comprising the trial step can be 
computed in the subproblems using different optimization algorithms. The substeps are 
only required to satisfy a mild decrease condition for the subproblems and a reasonable 
boundedness condition — both satisfied in practice by most methods of interest. This feature 
is of great practical significance because in applications like MDO various constraint blocks 
may originate from different disciplines and may require different approaches to solving the 
subproblems. 

• The class uses trust regions as a globalization strategy. The algorithms are proven to converge 
under reasonable assumptions. 

• The algorithms together with their convergence theory provide a foundation for developing 
the algorithms and analyses of the general multilevel optimization formulations. 

The proposed multilevel class of algorithms differs from the conventional algorithms in that its 
major iteration involves computing an approximate solution of not one model over a single restricted 
region, but of a sweep of models, each approximately minimized over its own restricted region. Each 
model approximates a block of constraints and, finally, the objective function, restricted to certain 
subspaces. Each model is computed at a different point. The case of a single block of constraints 
is included. 

In the next section we introduce the foundations on which the proposed class of algorithms rests. 
Section 3 is devoted to the description of the class. Section 4 briefly describes current theoretical 
results. Section 5 concludes with a summary and discussion of current and future research. 

2 Preliminaries 

The proposed class of algorithms may be viewed as an extension of several areas of research. In 
this section we describe the existing algorithms and analysis schemes which serve as a foundation 
for the multilevel optimization methods. 

2.1 The Local Brown-Brent Class of Methods 

Theoretical origins of this research lie in the method for solving nonlinear systems of equations, 
$F(x) = 0$, $F : \mathbb{R}^n \to \mathbb{R}^n$, introduced by Brown in [5], [6], [7]. In [4], Brent viewed Brown’s method
from a different perspective, which allowed Brent to propose a class of methods, among which
Brown’s original method was a special case. Gay [14] and Martinez [23], [24] provided further 
modifications and generalizations of the methods. 

The following statement of the general Brown-Brent algorithm was condensed from the
descriptions in Gay [14] and Dennis [17]. In these works the algorithm is described in terms of
one-dimensional blocks.

Denote the components of $F(x)$ by $F_1(x), \ldots, F_n(x)$.

Algorithm 2.1 Local Brown-Brent Algorithm for Nonlinear Systems 

Let $x_c$ be the current approximation to the solution.

Outer Loop: Do until convergence:

$y_0 = x_c$

$H_0 = \mathbb{R}^n$

Inner Loop: Do $k = 1, n$

1. Form the linearization, $L_k$, about $y_{k-1}$ of $F_k$ restricted to $\bigcap_{j=0}^{k-1} H_j$.
$L_k = 0$ defines $H_k$, an $(n - k)$-dimensional hyperplane in $\mathbb{R}^n$.

2. Move from $y_{k-1} \in \bigcap_{j=0}^{k-1} H_j$ to $y_k \in \bigcap_{j=0}^{k} H_j$.

End Inner Loop

$x_c = y_n$

End Outer Loop 

The point $y_n$ of intersection of all the hyperplanes is the point where all the linearizations vanish.
The way in which the steps 1-2 of the inner loop are actually done determines the particular kind
of Brown-Brent method. In Brent’s method, $s_k = y_k - y_{k-1}$ is the shortest $\ell_2$ norm step from
$y_{k-1}$ to $H_k$. In Brown’s method, $s_k$ is the shortest $\ell_2$ norm step from $y_{k-1}$ to $H_k$ parallel to the $k$-th
coordinate axis.

When applied to a linear system of equations, i.e., when $F(x) = Ax - b$, Brown’s method is
equivalent to Gaussian elimination with pivoting about the maximum row element of the reduced
matrix [5], while Brent’s method is equivalent to factoring $A$ into a product of a lower triangular
matrix and an orthogonal matrix [4]. It can be shown, based on [31], that there exists a Brown-Brent 
analog for any matrix decomposition in the linear case. 
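
The linear case can be illustrated with a short sketch. The Python code below is not taken from [4] or [5]; it is a minimal illustration, assuming a nonsingular matrix, of one inner sweep of a Brent-type method applied to $F(x) = Ax - b$. The function name `brent_sweep_linear` and the test data are hypothetical. Each substep is the shortest $\ell_2$ step that reaches the next hyperplane while staying in the hyperplanes already reached, so after $n$ substeps all equations hold.

```python
import numpy as np

def brent_sweep_linear(A, b, x0):
    """One inner sweep of a Brent-type method on F(x) = A x - b (A nonsingular)."""
    n = A.shape[0]
    y = np.array(x0, dtype=float)
    basis = []  # orthonormal vectors spanning the rows processed so far
    for k in range(n):
        a = A[k].astype(float)
        # Remove components along earlier rows so the step stays inside
        # the hyperplanes that were already reached.
        p = a.copy()
        for q in basis:
            p -= (q @ p) * q
        # Shortest l2 step inside that intersection onto a . y = b_k.
        y = y - ((a @ y - b[k]) / (a @ p)) * p
        basis.append(p / np.linalg.norm(p))
    return y

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
y = brent_sweep_linear(A, b, np.zeros(3))
residual = np.linalg.norm(A @ y - b)
```

The orthonormal vectors accumulated in `basis` are the Gram-Schmidt orthogonalization of the rows of $A$, which is one way to see the connection with a triangular-orthogonal factorization.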

Brown [5], [7], Brown and Dennis [8], Brent [4], and Gay [14] established local quadratic
convergence of variants of the algorithm, both for analytic and finite difference derivatives. To the
authors’ knowledge, there had been no theoretically supported global extensions of Brown-Brent 
methods until [1]. 

2.2 Trust-Region Methods 

Consider the following unconstrained minimization problem. 

Problem UNC: 


minimize $f(x)$, $x \in \mathbb{R}^n$,

where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. Given $x_c$, the current approximation to the solution,
a trust-region algorithm for solving the problem finds a trial step by solving the following
trust-region subproblem approximately:

$$
\begin{aligned}
\text{minimize} \quad & f(x_c) + \nabla f(x_c)^T s + \tfrac{1}{2} s^T H_c s \\
\text{subject to} \quad & \|s\| \le \delta_c,
\end{aligned}
\tag{1}
$$

where $f(x_c), \delta_c \in \mathbb{R}$; $\nabla f(x_c), s \in \mathbb{R}^n$; $H_c = H_c^T \in \mathbb{R}^{n \times n}$ is the Hessian of $f$ or an approximation to it; $\delta_c > 0$
is the trust-region radius; and $\|\cdot\|$ denotes the $\ell_2$ norm. The idea is to accept the trial step when
the quadratic model adequately predicts the behavior of the function, and to recompute the step 
in a smaller region if it does not.

The trust-region approach to the problem of solving systems of nonlinear equations is just a
special case of the approach to the problem above; namely, for nonlinear equations, the objective
function $f(x)$ is taken to be $\|F(x)\|^2$.

Detailed treatment of the trust-region approach to unconstrained optimization and nonlinear 
equations can be found in Dennis and Schnabel [18], Sorensen [30], Moré [25], Moré and Sorensen
[26], Powell [21], and Shultz, Schnabel and Byrd [28].

For the equality constrained optimization problem, the successive quadratic programming (SQP)
algorithm is commonly used. Its step is found by computing a minimum of the quadratic model
of the Lagrangian at the current point, subject to linearized constraints. A trust-region algorithm
based on SQP adds the trust-region constraint to the subproblem and additional constraints
designed to ensure that the trust-region constraint and the linearized constraints are consistent.

2.2.1 Merit Functions 

In order to evaluate a trial step, trust-region algorithms use merit functions, which are functions 
related to the problem in such a way that the improvement in the merit function signifies progress 
toward the solution of the problem. 

For unconstrained minimization, a natural choice for a merit function is the objective function 
itself. Let

$$\phi(s) = f(x_c) + \nabla f(x_c)^T s + \tfrac{1}{2} s^T H_c s \tag{2}$$

denote the quadratic model of the merit function. We define two related functions.

The actual reduction is defined as

$$ared_c(s_c) = f(x_c) - f(x_c + s_c), \tag{3}$$

and the predicted reduction is defined as

$$pred_c(s_c) = \phi(0) - \phi(s_c) = -\nabla f(x_c)^T s_c - \tfrac{1}{2} s_c^T H_c s_c, \tag{4}$$

so that the predicted reduction in the merit function is an approximation to the actual reduction 
in the merit function. 

The standard way to evaluate the trial step in trust-region methods is to consider the ratio of 
the actual reduction to the predicted reduction. A value lower than a small predetermined value 
causes the step to be rejected. Otherwise the step is accepted. 

For nonlinear systems of equations, the norm of the residuals serves as a merit function. For
constrained optimization, the merit function is some expression that involves both the objective
function and the constraints.

We shall see that conventional merit functions prove to be inadequate for multilevel algorithms. 

2.2.2 Fraction of Cauchy Decrease 

To assure global convergence of a trust-region algorithm for problem UNC, the trial step is required 
to satisfy a fraction of Cauchy decrease condition. This mild condition means that the trial
step, $s_c$, must predict at least a fraction of the decrease predicted by the Cauchy step, which is the
steepest descent step for the model within the trust region. We must have for some fixed $\kappa > 0$

$$\phi(s_c) - \phi(0) \le \kappa \left[ \phi(s_c^{cp}) - \phi(0) \right], \tag{5}$$

where

$$
s_c^{cp} = -\alpha_c^{cp} \nabla f(x_c) \quad \text{with} \quad
\alpha_c^{cp} =
\begin{cases}
\dfrac{\|\nabla f(x_c)\|^2}{\nabla f(x_c)^T H_c \nabla f(x_c)} & \text{if } \dfrac{\|\nabla f(x_c)\|^3}{\nabla f(x_c)^T H_c \nabla f(x_c)} \le \delta_c, \\[2ex]
\dfrac{\delta_c}{\|\nabla f(x_c)\|} & \text{otherwise.}
\end{cases}
$$

See Dennis and Schnabel [18], pp. 139-141, for details on the Cauchy point.
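
As an illustration, the Cauchy step can be computed in a few lines. The Python sketch below is not from [18]; the function names `cauchy_step` and `model_decrease` and the test data are hypothetical. The handling of non-positive curvature (stepping to the trust-region boundary) is an assumption added here for robustness. The example checks numerically that the Cauchy step stays inside the trust region and that its predicted decrease is at least $\frac{1}{2}\|\nabla f(x_c)\| \min\{\|\nabla f(x_c)\| / \|H_c\|, \delta_c\}$, a standard bound of the kind used in trust-region convergence proofs.

```python
import numpy as np

def cauchy_step(g, H, delta):
    """Steepest-descent minimizer of the quadratic model within ||s|| <= delta."""
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros_like(g)
    curvature = g @ H @ g
    # Assumption: with non-positive curvature the model keeps decreasing
    # along -g, so the step goes to the trust-region boundary.
    if curvature > 0.0 and gnorm**3 / curvature <= delta:
        alpha = gnorm**2 / curvature
    else:
        alpha = delta / gnorm
    return -alpha * g

def model_decrease(g, H, s):
    """phi(0) - phi(s) for phi(s) = f + g.s + 0.5 s.H.s."""
    return -(g @ s) - 0.5 * (s @ H @ s)

g = np.array([1.0, -2.0])
H = np.array([[2.0, 0.0], [0.0, 1.0]])
gnorm = np.linalg.norm(g)
checks = []
for delta in (0.1, 10.0):
    s_cp = cauchy_step(g, H, delta)
    decrease = model_decrease(g, H, s_cp)
    bound = 0.5 * gnorm * min(gnorm / np.linalg.norm(H, 2), delta)
    checks.append((np.linalg.norm(s_cp) <= delta + 1e-12, decrease >= bound))
```

The small radius exercises the boundary branch and the large radius exercises the interior branch of the step-length formula.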

The fraction of Cauchy decrease property implies a weaker condition which has a more convenient
form and is frequently used as a technical lemma in the global convergence proofs.


Lemma 2.1 Let $s_c$ satisfy (5). Then

$$\phi(0) - \phi(s_c) \ge \frac{\kappa}{2} \|\nabla f(x_c)\| \min\left\{ \frac{\|\nabla f(x_c)\|}{\|H_c\|}, \delta_c \right\}. \tag{6}$$

References: Powell [21]; Moré [25].

Either (5) or (6) is necessary to establish global convergence theoretically. 


2.2.3 Global Convergence Results 

Powell’s global convergence theorem [21] for any unconstrained minimization trust-region algorithm 
serves as a prototype for most trust-region related convergence results. 

Theorem 2.1 Let $f : \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable and bounded below on the level set
$\{x \in \mathbb{R}^n \mid f(x) \le f(x_0)\}$. Assume that $\{H_i\}$ are uniformly bounded above. Let $\{x_i\}$ be the sequence
of iterates generated by a trust-region algorithm that satisfies (5) or (6). Then

$$\liminf_{i \to \infty} \|\nabla f(x_i)\| = 0.$$

Detailed treatment of the unconstrained minimization theory and practice can be found in Moré
[25], Moré and Sorensen [26], Sorensen [30], and Shultz, Schnabel and Byrd [28].


2.2.4 Tangent-Space Methods for Constrained Optimization 

The multilevel methods proposed here may be viewed as a generalization of an approach to nonlinear 
programming known as the null space or generalized elimination approach (see Fletcher [13]).

Different authors refer to different methods as “null space methods”, but the general idea of a 
null space method for equality constrained minimization is to reduce the dimension of the problem 
by first taking the step intended to solve the constraint equations, and then to minimize the model 
of the function restricted to the null space of the linearized constraints. The resulting minimization 
problem is of a lower dimension than the original one. 

A well-known local method of this type is the GRG (Generalized Reduced Gradient) algorithm. 
Details of GRG and other null space methods can be found in Lasdon [20], Fletcher [13], Avriel [2], 
and Gill, Murray and Wright [27].

A class of global trust-region algorithms that use the same general principle of reducing the
problem’s dimension is known as the class of tangent space methods. The tangent space approach 
was introduced to avoid the possibility of inconsistency of the constrained trust-region subproblem. 

Recent work on these methods can be found in Maciel [22] and Dennis, El-Alem and Maciel 
[19]. The main feature of the class is that the trial step is computed as a sum of two substeps, the 
first of which is made toward the linearized constraints in the direction orthogonal to the null space 
of the constraint Jacobian, while the second is made to minimize the model of the Lagrangian in 
the null space of the linearized constraints. The function and derivative information is computed
at a single point $x_c$.
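
The two-substep structure can be sketched as follows. This Python code is not the algorithm of [19] or [22]; it is a minimal illustration that omits the trust-region constraints and assumes a full-row-rank Jacobian `J` (stored with one constraint gradient per row) and a model Hessian `B` that is positive definite on the null space of `J`. The function name `tangent_space_step` and the data are hypothetical.

```python
import numpy as np

def tangent_space_step(g, B, J, c):
    """Trial step s = s_n + s_t for the linearized constraints c + J s = 0.

    g, B: gradient and Hessian of the quadratic model at the current point.
    J: m-by-n constraint Jacobian, assumed to have full row rank.
    """
    m, n = J.shape
    # Normal substep: minimum-norm solution of J s = -c, which is
    # orthogonal to the null space of J.
    s_n = -np.linalg.pinv(J) @ c
    # Orthonormal basis Z for the null space of J from a complete QR of J^T.
    Q, _ = np.linalg.qr(J.T, mode="complete")
    Z = Q[:, m:]
    # Tangential substep: minimize the quadratic model over s_n + Z v.
    v = np.linalg.solve(Z.T @ B @ Z, -Z.T @ (g + B @ s_n))
    return s_n, Z @ v

J = np.array([[1.0, 1.0, 1.0]])
c = np.array([1.0])
g = np.array([1.0, 0.0, -1.0])
B = np.diag([2.0, 1.0, 3.0])
s_n, s_t = tangent_space_step(g, B, J, c)
s = s_n + s_t
```

Both substeps here use information at the single point where `g`, `B`, `J`, and `c` were evaluated, which is the feature that the multilevel methods relax.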

The multilevel methods proposed here generalize the tangent space methods in the sense that 
their trial steps are sums of not two substeps but of as many substeps as there are constraint 
blocks together with a substep on the model of the objective function with the model information 
computed at the points resulting from taking the substeps one-by-one. 

3 Multilevel Algorithms for Nonlinear Optimization 

In this section we present the class of multilevel optimization algorithms for the nonlinear equality 
constrained minimization problem. Since the time of its introduction in [1], the class has undergone 
changes. In [1], the globalization and extension to constrained optimization only of local Brent’s 
method was proposed. Recent developments* have extended the results to provide globalization 
and extension to constrained optimization of the entire local Brown-Brent class. 

3.1 Notation 

Due to arbitrary blocking of the constraints, the notation becomes cumbersome. To ease the 
reading effort, we omit the subscripts and superscripts where possible. Here is an explanation of 
the notation conventions. 

Unless specified otherwise, all norms are $\ell_2$ norms.

From here on, we assume that the equality constraints of problem EQC are partitioned into
$M$ blocks of arbitrary size and composition. Let the constraints of the first block be numbered
from $n_1 = 1$ to $n_2 - 1$; the constraints of the second block, from $n_2$ to $n_3 - 1$; and so on, until the
constraints of the last block are numbered from $n_M$ to $n_{M+1} - 1 = m$.

The algorithms will be formally considered to have an outer loop, in which we make the decision
about the acceptability of the step, and an inner loop, in which we solve a sequence of minimization
subproblems. The sum of the substeps produced as solutions of these subproblems yields the total
trial step. The outer loop counter is $i$; the inner loop counter is $k$. Thus $k$ corresponds to the
block number of constraints. If the subscript $k$ is used with a constant, that constant refers to
the properties of the $k$-th block of constraints, independent of the iterates. Note that the term
“inner loop” is formal. The purpose of the inner loop is to compute a basis for the null space of the
Jacobian of our constraint system, but step-by-step, using information at different points, instead
of the simultaneous computation of, say, Newton’s method.

* Natalia Alexandrov and J. E. Dennis, Jr. A class of general trust-region multilevel algorithms for systems of 
nonlinear equations: Global convergence theory. In preparation.

We denote the sequence of points generated by the outer loop of the algorithm by $\{x_i\}$ when
we consider the iterates as members of the sequence in convergence analysis. Otherwise, we use $x_c$,
$x_-$, and $x_+$ to denote the current, the previous, and the next iterates, respectively.

We denote the sequence of points generated within each inner loop by $y_k$, $k = 0, \ldots, M + 1$,
when we need not consider the outer loop iteration number. Most of the time we shall be discussing
entities within a single iteration. Otherwise, we use subscripts and superscripts. For example, $y_k^i$
or $y_k^c$ denotes the inner $k$-th iterate within the $i$-th outer loop. Note that $y_0 = x_c$ and $y_{M+1} = x_+$.

The substep produced by solving the $k$-th subproblem of the inner loop is denoted by $s_k$, $k =
1, \ldots, M + 1$. The sum $s_1 + \ldots + s_{M+1}$ yields the total trial step $s_c$. Again, we use subscripts, e.g.,
$s_i$, to denote the total step as a part of the sequence of steps produced by the algorithm.

We denote the radius of the trust region for subproblem $k$, centered at $y_{k-1}$, by $\delta_k$, $k =
1, \ldots, M + 1$. The radius of the total trust region, centered at $x_c = y_0$, is $\delta_c$ or $\delta_i$.

We denote the projector onto the intersection of null spaces of $\nabla C_1(x), \ldots, \nabla C_k(x)$ by $P_k$.

Again, when we omit superscripts, we refer to the objects within a single outer loop. For
example, $C_k(y_{k-1})$ refers to $C_k(y_{k-1}^i)$ or $C_k(y_{k-1}^c)$.

Additional notation will be introduced as needed. 

3.2 General Description 

The general class of multilevel algorithms can be described in the following way. The constraint
system of the problem is partitioned into $M$ arbitrary blocks. In practice, this block decomposition
is obvious in most cases. At the current approximation to a solution of problem EQC, $x_c$, we set
$y_0 = x_c$. The trial step is computed as follows.

We find an approximate minimizer, $s_1$, of the quadratic Gauss-Newton model about $y_0$ of the
first block of constraints in the trust region of radius $\delta_1$. The step is required to satisfy a fraction
of Cauchy decrease condition for this model and a mild boundedness condition discussed in the next
subsection. The step is taken to yield the point $y_1 = y_0 + s_1$.

We then find an approximate minimizer of the quadratic model of the second block of constraints,
restricted to the null space of the Jacobian of the first block. This model is built using the
information at the new point. It is important to emphasize that all the function and derivative
information for the second block is computed at the new point $y_1$. The next step, $s_2$, bounded by
its own trust region, is obtained to satisfy a fraction of Cauchy decrease condition for this restricted
model of the second block. The step is taken to yield the point $y_2$.

The process of computing steps that satisfy sufficient predicted decrease for the restricted models 
of progressively smaller dimensions continues. Again, the model for each block is built by 
using the function and derivative information at the most recently computed point. 
The final step on the constraints, $s_M$, is obtained to produce sufficient predicted decrease in the
quadratic model, at $y_{M-1}$, of the last block of constraints, restricted to the intersection of the null
spaces of the Jacobians of all previous blocks.

When all the constraint blocks have been processed, $n - m$ degrees of freedom still remain.
The remaining variables are used in building a model of the objective function, so that the final
substep, $s_{M+1}$, is obtained to produce sufficient predicted decrease in the quadratic model at $y_M$
of the objective function, restricted to the intersection of the null spaces of the Jacobians of all
constraint blocks. The final step is taken to yield the next major iterate, i.e., the next approximation
to a solution of problem EQC. Thus, the total trial step from $x_c$ to $x_+$ is the sum of the substeps
in the inner sweep, i.e., $s_c = s_1 + \ldots + s_{M+1}$.

Unless the convergence criterion is met, the total trial step is evaluated, and the algorithm 
returns to process again the first block of constraints in a trust region determined by the success 
or failure of the trial step. 

3.2.1 Computing the Substeps 

During the constraint elimination stage, the substeps solve the following subproblems: 

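$$
\begin{aligned}
\text{minimize} \quad & \tfrac{1}{2} \|C_k(y_{k-1}) + \nabla C_k(y_{k-1})^T s\|_2^2 \\
\text{subject to} \quad & \nabla C_j(y_{j-1})^T s = 0, \quad j = 1, \ldots, k - 1, \\
& \text{and possibly an additional constraint on the step direction,} \\
& \text{and } \|s\|_2 \le \delta_k,
\end{aligned}
$$

for $k = 1, \ldots, M$. (Note that for $k = 1$ there is no null space constraint.) Then the objective
function subproblem is:

$$
\begin{aligned}
\text{minimize} \quad & f(y_M) + \nabla f(y_M)^T s + \tfrac{1}{2} s^T H(y_M) s \\
\text{subject to} \quad & \nabla C_j(y_{j-1})^T s = 0, \quad j = 1, \ldots, M, \\
& \text{and possibly an additional constraint on the step direction,} \\
& \text{and } \|s\|_2 \le \delta_{M+1}.
\end{aligned}
$$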

If there is no additional constraint on the direction of the step, the subproblems produce a
trust-region generalization of the local Brent step. A constraint requiring that the step be parallel
to some coordinate hyperplane would be a generalization of the local Brown step. In practice, there
is no explicit constraint for the generalization of the Brown step; rather it is computed implicitly.

Let $Q_{k-1}$ be a matrix the columns of which form a basis for the intersection of the null spaces of
$\nabla C_1(y_0), \ldots, \nabla C_{k-1}(y_{k-2})$. A change of variables, $s = Q_{k-1} v$, converts the constrained
subproblems to unconstrained ones.

For relatively small problems, the null space bases can be computed by using the QR decomposition
to find the basis for the null space of $\nabla C_1(y_0)$, and then by updating the decomposition for
subsequent subproblems to find a basis for the null space intersections. For larger problems, the
QR decomposition becomes prohibitively expensive. In that case, reduced basis projectors can be 
used. More details about the null space basis computations can be found, for example, in [27]. 
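
For small dense problems, the QR-based construction can be sketched as below. This is a simplified illustration rather than the updating scheme referred to in the text: the factorization is recomputed from scratch for each intersection, and the stacked Jacobians are assumed to have full row rank. Each Jacobian is stored with one constraint gradient per row, and the function name `null_space_basis` and the data are hypothetical.

```python
import numpy as np

def null_space_basis(jacobians, n):
    """Orthonormal basis for the intersection of the null spaces of the
    given block Jacobians, each of shape (m_j, n)."""
    if not jacobians:
        return np.eye(n)
    stacked = np.vstack(jacobians)   # all constraint gradients processed so far
    # Complete QR of the transpose: the trailing columns of Q are
    # orthogonal to every row of `stacked` (full row rank assumed).
    Q, _ = np.linalg.qr(stacked.T, mode="complete")
    return Q[:, stacked.shape[0]:]

J1 = np.array([[1.0, 0.0, 0.0, 0.0]])
J2 = np.array([[0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]])
Q1 = null_space_basis([J1], 4)        # used when processing the second block
Q2 = null_space_basis([J1, J2], 4)    # used for the objective subproblem
```

With such a basis, writing a step as the basis matrix times a vector of reduced variables satisfies the null space constraints automatically, so only the trust-region constraint remains in the reduced subproblem.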

There are various methods for solving large-scale trust-region subproblems. We hold
much hope for the method recently developed by D. Sorensen of Rice University.

However, once the subproblems with null space constraints are converted into unconstrained 
trust-region subproblems, the steps may be chosen in any manner, as long as they satisfy two mild 
conditions. 

1. As mentioned earlier, if there are no additional constraints on the subproblem $k$, its solution,
a Levenberg-Marquardt step for the reduced problem, produces a generalization of the Brent
step. That is, the substep is orthogonal to the linearized constraint hyperplane, for all blocks
numbered $k + 1, \ldots, M$. However, we do not require that the substeps be orthogonal. We
require that each substep satisfies

$$\|s_k\| \le \Lambda_k \|C_k(y_{k-1})\| \tag{7}$$

for some positive constant $\Lambda_k$ that depends only on the properties of that particular constraint
block but is independent of the iteration. Other conditions are possible to assure global
convergence. However, this condition, first formalized in [19], is reasonable in that it is
enforced automatically by any algorithm of interest for computing linearly feasible points.
For instance, this is easily shown for the extensions of both Brown and Brent steps.


2. We also require each substep to satisfy a fraction of Cauchy decrease condition for the
particular subproblem that substep solves. This is also a very mild condition; it is satisfied by
all reasonable methods. Note that we do not place any conditions on the total trial step, only
on the substeps.

It is easy to show that if $s_k^{B-B}$ is an unconstrained Brown or Brent substep (or any substep
out of the local Brown-Brent class), we can claim the following:

If $\|s_k^{B-B}\| \le \delta_k$, then let $s_k = s_k^{B-B}$. Otherwise let

$$s_k = \frac{\delta_k}{\|s_k^{B-B}\|} \, s_k^{B-B}. \tag{8}$$

Then $s_k$ satisfies the fraction of Cauchy decrease condition on subproblem $k$. The proof is
given in Alexandrov and Dennis†.

Thus, we see that simply truncating the unconstrained Brown or Brent substep to the size 
of the trust region will produce sufficient predicted decrease in the models of the constraint 
blocks. 
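
The inner sweep with truncated Brent-type substeps can be sketched as follows. This Python code is an illustration under stated assumptions, not the authors' implementation: Jacobians are stored with one constraint gradient per row and are assumed to have full row rank when stacked, the null space basis is recomputed rather than updated, and the objective substep is a truncated reduced Newton step, which assumes a positive definite reduced Hessian. The names `truncate` and `inner_sweep` and the toy problem are hypothetical.

```python
import numpy as np

def truncate(step, delta):
    """Scale a step back to the trust-region boundary if it is too long."""
    norm = np.linalg.norm(step)
    return step if norm <= delta else (delta / norm) * step

def inner_sweep(x_c, blocks, grad_f, hess_f, deltas):
    """One inner sweep: M constraint substeps, then one objective substep.

    blocks: list of (C_k, J_k) callables; J_k(y) has shape (m_k, n).
    deltas: M + 1 trust-region radii.
    """
    n = x_c.size
    y = np.array(x_c, dtype=float)
    Q = np.eye(n)            # basis for the null spaces processed so far
    rows, substeps = [], []
    for (C, J), delta in zip(blocks, deltas[:-1]):
        Ck, Jk = C(y), J(y)  # information at the most recent point
        # Minimum-norm Gauss-Newton step in reduced variables, s = Q v.
        v = -np.linalg.pinv(Jk @ Q) @ Ck
        s = truncate(Q @ v, delta)
        y = y + s
        substeps.append(s)
        rows.append(Jk)
        stacked = np.vstack(rows)
        Q_full, _ = np.linalg.qr(stacked.T, mode="complete")
        Q = Q_full[:, stacked.shape[0]:]
    # Objective substep in the remaining n - m degrees of freedom.
    g_red = Q.T @ grad_f(y)
    H_red = Q.T @ hess_f(y) @ Q
    v = -np.linalg.solve(H_red, g_red)
    s = truncate(Q @ v, deltas[-1])
    substeps.append(s)
    return y + s, substeps

# Toy problem (hypothetical): two linear blocks and a quadratic objective.
target = np.array([1.0, 2.0, 3.0])
blocks = [
    (lambda y: np.array([y.sum() - 1.0]), lambda y: np.array([[1.0, 1.0, 1.0]])),
    (lambda y: np.array([y[0] - y[1]]), lambda y: np.array([[1.0, -1.0, 0.0]])),
]
x_plus, substeps = inner_sweep(
    np.zeros(3), blocks,
    grad_f=lambda y: y - target,
    hess_f=lambda y: np.eye(3),
    deltas=[10.0, 10.0, 10.0],
)
```

Because the toy constraints are linear and the radii are large, a single sweep reaches the constrained minimizer; for nonlinear blocks the sweep only produces a trial step that must still be evaluated by the outer loop.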


3.2.2 The Merit Function and Its Model 

Merit functions used to evaluate the progress of single-block trust-region algorithms consist of some 
combination of the objective function and the constraints. One common merit function is the $\ell_2$
penalty function $f(x) + \rho \|C(x)\|^2$, where $\rho$ is the penalty parameter.

In the process of the multilevel algorithm development, it has become apparent that conventional
merit functions are inadequate for measuring progress of the multilevel methods, because a
conventional merit function does not take into account the order in which minimization proceeds.

The difficulty can be summarized as follows: 

• The result of the $k$-th minimization subproblem predicts decrease for the $k$-th component
from point $y_{k-1}$ to point $y_k$. It predicts no change for all previous blocks. However, there
is no prediction at all about how $s_1 + \ldots + s_k$ changes, and likely increases, the norms of the
blocks numbered $k + 1, \ldots, M$. Neither does any substep, except $s_{M+1}$, predict the behavior
of the objective function.

† Natalia Alexandrov and J. E. Dennis, Jr. A class of general trust-region multilevel algorithms for systems of
nonlinear equations: Global convergence theory. In preparation.

This observation brought us to the conclusion that the merit function must take into account
the multilevel structure of the scheme. Consider the following modified $\ell_2$ penalty function:

$$
\begin{aligned}
P(x; \rho_1, \ldots, \rho_M) &= f(x) + \rho_M \Bigl( \|C_M(x)\|^2 + \rho_{M-1} \bigl( \|C_{M-1}(x)\|^2 + \rho_{M-2} ( \|C_{M-2}(x)\|^2 + \cdots + \rho_2 ( \|C_2(x)\|^2 + \rho_1 \|C_1(x)\|^2 ) \cdots ) \bigr) \Bigr) \\
&= f(x) + \sum_{k=1}^{M} \Bigl( \prod_{j=k}^{M} \rho_j \Bigr) \|C_k(x)\|^2,
\end{aligned}
$$

where $\rho_k \ge 1$, $k = 1, \ldots, M$. The initial choice $\rho_k = 1$ is arbitrary
and scale-dependent. The only requirement is that $\rho_k \ge 1$. For theoretical purposes, the problem
is assumed to be well-scaled.
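
The equivalence of the nested form and the product-sum form of the modified penalty function can be checked numerically. The Python sketch below is only an illustration; the function names `merit_nested` and `merit_sum` and the sample values are hypothetical. The inputs are the objective value, the squared norms of the constraint blocks in the order $1, \ldots, M$, and the penalty parameters in the same order.

```python
import numpy as np

def merit_nested(f_val, c_norms_sq, rho):
    """Evaluate the merit function by its nested form, innermost block first."""
    inner = 0.0
    for c_sq, r in zip(c_norms_sq, rho):   # blocks k = 1, ..., M
        inner = r * (c_sq + inner)
    return f_val + inner

def merit_sum(f_val, c_norms_sq, rho):
    """Evaluate the merit function as a sum of products of penalty parameters."""
    M = len(rho)
    # Block k is weighted by the product of the parameters k, ..., M.
    return f_val + sum(np.prod(rho[k:]) * c_norms_sq[k] for k in range(M))

f_val = 1.5
c_norms_sq = [4.0, 1.0, 0.25]
rho = [2.0, 3.0, 1.5]
p_nested = merit_nested(f_val, c_norms_sq, rho)
p_sum = merit_sum(f_val, c_norms_sq, rho)
```

In this example the first block carries the weight $2 \cdot 3 \cdot 1.5 = 9$ while the last block carries only $1.5$, which shows how the earliest blocks are penalized most heavily.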

The new merit function penalizes for the possible predicted increase in the constraint blocks
$k, \ldots, M$, or in the objective function that may have occurred during inner loop iterations $1, \ldots, k - 1$.

At $y_{M+1} = x_+ = x_c + s_c$, we model each $\|C_k(x_+)\|^2$ by $\|C_k(y_{k-1}) + \nabla C_k(y_{k-1})^T s_k\|^2$, and so we
model the merit function at $x_+$ by

$$
\begin{aligned}
\mathcal{M}_c(s_1, \ldots, s_{M+1}; \rho_1^c, \ldots, \rho_M^c)
&= f(y_M) + \nabla f(y_M)^T s_{M+1} + \tfrac{1}{2} s_{M+1}^T H(y_M) s_{M+1} \\
&\quad + \rho_M^c \Bigl( \|C_M(y_{M-1}) + \nabla C_M(y_{M-1})^T s_M\|^2 + \rho_{M-1}^c \bigl( \|C_{M-1}(y_{M-2}) + \nabla C_{M-1}(y_{M-2})^T s_{M-1}\|^2 \\
&\quad + \rho_{M-2}^c ( \|C_{M-2}(y_{M-3}) + \nabla C_{M-2}(y_{M-3})^T s_{M-2}\|^2 + \cdots \\
&\quad + \rho_2^c ( \|C_2(y_1) + \nabla C_2(y_1)^T s_2\|^2 + \rho_1^c \|C_1(y_0) + \nabla C_1(y_0)^T s_1\|^2 ) \cdots ) \bigr) \Bigr) \\
&= f(y_M) + \nabla f(y_M)^T s_{M+1} + \tfrac{1}{2} s_{M+1}^T H(y_M) s_{M+1} \\
&\quad + \sum_{k=1}^{M} \Bigl( \prod_{j=k}^{M} \rho_j^c \Bigr) \|C_k(y_{k-1}) + \nabla C_k(y_{k-1})^T s_k\|^2.
\end{aligned}
$$

We define the actual reduction as the difference between the merit function values at $x_c$ and
$x_+$, and we define the predicted reduction as the difference between the value of the merit function
at $x_c$ and the value of the model at $x_+$.

3.2.3 Updating the Penalty Parameters 

This penalty parameter updating scheme for multilevel methods generalizes the scheme proposed 
in El-Alem [10], [11]. It ensures that our merit function has an essential property, namely, that 
unless an iterate is optimal, the predicted reduction should always be positive. We use the following 
procedure: 

Algorithm 3.1 Penalty Parameter Updating Algorithm (Done on completion of each inner 
sweep of minimization problems.) 

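Denote the set $\{s_1, \ldots, s_k\}$ by $S_k$ and denote the set $\{\rho_1, \ldots, \rho_k\}$ by $P_k$.

At the beginning of a multilevel algorithm, set $\rho_1^- = \ldots = \rho_M^- = 1$ and choose $\beta \in (0, 1)$.

1. Compute $Cpred_1(s_1) = \|C_1(y_0)\|^2 - \|C_1(y_0) + \nabla C_1(y_0)^T s_1\|^2$.

2. Do $k = 1, M - 1$

Update $\rho_k$.

Compute

$$Cpred_{k+1}(S_{k+1}; P_{k-1}^c, \rho_k^-) = \left[ \|C_{k+1}(y_0)\|^2 - \|C_{k+1}(y_k) + \nabla C_{k+1}(y_k)^T s_{k+1}\|^2 \right] + \rho_k^- \, Cpred_k(S_k; P_{k-1}^c).$$

if $Cpred_{k+1}(S_{k+1}; P_{k-1}^c, \rho_k^-) \ge \frac{\rho_k^-}{2} \, Cpred_k(S_k; P_{k-1}^c)$ then

$\rho_k^c = \rho_k^-$.

$Cpred_{k+1}(S_{k+1}; P_{k-1}^c, \rho_k^c) = Cpred_{k+1}(S_{k+1}; P_{k-1}^c, \rho_k^-)$.

else

$\rho_k^c = \bar{\rho}_k + \beta$,

where $\bar{\rho}_k = \dfrac{2 \left[ \|C_{k+1}(y_k) + \nabla C_{k+1}(y_k)^T s_{k+1}\|^2 - \|C_{k+1}(y_0)\|^2 \right]}{Cpred_k(S_k; P_{k-1}^c)}$.

Compute $Cpred_{k+1}(S_{k+1}; P_{k-1}^c, \rho_k^c)$.

end if
end Do

3. Update $\rho_M$.
Compute

$$pred(S_{M+1}; P_{M-1}^c, \rho_M^-) = \left[ f(y_0) - \phi_M(s_{M+1}) \right] + \rho_M^- \, Cpred_M(S_M; P_{M-1}^c).$$

if $pred(S_{M+1}; P_{M-1}^c, \rho_M^-) \ge \frac{\rho_M^-}{2} \, Cpred_M(S_M; P_{M-1}^c)$ then

$\rho_M^c = \rho_M^-$.

$pred(S_{M+1}; P_{M-1}^c, \rho_M^c) = pred(S_{M+1}; P_{M-1}^c, \rho_M^-)$.

else

$\rho_M^c = \bar{\rho}_M + \beta$,

where $\bar{\rho}_M = \dfrac{2 \left[ \phi_M(s_{M+1}) - f(y_0) \right]}{Cpred_M(S_M; P_{M-1}^c)}$.

Compute $pred(S_{M+1}; P_{M-1}^c, \rho_M^c)$.

end if
End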

Note that without updating the penalty parameters we can be assured of the positive predicted
reduction from $x_c$ only for the first block of constraints, i.e., only $Cpred_1(s_1)$ is definitely positive
without additional considerations. To ensure that $Cpred_2(s_1, s_2; \rho_1)$ is positive, we may have to
increase $\rho_1$. Now that $Cpred_2(s_1, s_2; \rho_1)$ is positive, we use it to ensure that the next partial
predicted reduction is positive, and so on. So, for each substep $s_k$, the predicted reduction
accumulated by the step $s_1 + \ldots + s_k$ is at least a fraction of the predicted decrease accumulated
by the step $s_1 + \ldots + s_{k-1}$.

Thus the predicted reduction of the first block is the most heavily penalized one. 

It should be emphasized that the step computation is completely independent of the penalty 
parameter computation. 
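
The cascade of penalty parameter updates can be sketched in code. The Python function below reflects one reading of Algorithm 3.1 and rests on assumptions that should be checked against the full theory: the acceptance test compares each accumulated predicted reduction with half the penalty parameter times the previous accumulated reduction, and a parameter that fails the test is raised to the smallest passing value plus a margin `beta`. The function name `update_penalties`, its argument layout, and the sample numbers are hypothetical, and the first accumulated reduction is assumed to be positive.

```python
def update_penalties(block_reductions, objective_reduction, rho_prev, beta=0.5):
    """Return updated penalty parameters and the total predicted reduction.

    block_reductions[k]: ||C(y_0)||^2 minus the squared norm of the
        linearized model, for constraint block k + 1 (0-based list).
    objective_reduction: f(y_0) minus the objective model at the last substep.
    rho_prev: penalty parameters carried over from the previous iteration.
    """
    rho = list(rho_prev)
    accumulated = block_reductions[0]      # first block, assumed positive
    later_terms = list(block_reductions[1:]) + [objective_reduction]
    for k, term in enumerate(later_terms):
        trial = term + rho[k] * accumulated
        if trial < 0.5 * rho[k] * accumulated:
            # Smallest parameter passing the test, plus the margin beta.
            rho[k] = -2.0 * term / accumulated + beta
            trial = term + rho[k] * accumulated
        accumulated = trial
    return rho, accumulated

# Two blocks: the second block's model norm increased during the sweep.
rho, pred = update_penalties([1.0, -3.0], -0.5, rho_prev=[1.0, 1.0])
```

In the example the first parameter must grow because the sweep increased the second block, while the second parameter is left unchanged; the returned total predicted reduction is positive, and the function never looks at how the substeps were computed.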

3.2.4 Step Evaluation and Trust-Region Radii Updating

Although there are various schemes for evaluating the trial step and updating the trust region radii, 
for the sake of simplicity in discussion, we adopt the following strategy: 

• The total trial step is evaluated outside the inner loop. 

• All individual trust region radii are equal and are updated simultaneously by the same factor. 

Other strategies for practical implementations are discussed in Alexandrov*. We would like to 
emphasize that the simultaneous expansion or contraction of the trust region radii is not a technical 
requirement. 

The algorithm for evaluating the step and updating the trust region radii follows. 

Algorithm 3.2 Step Evaluation / Trust Region Update 

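Given $\delta_k > 0$, $k = 1, \ldots, M$ (or $k = 1, \ldots, M + 1$ for optimization), $\delta_{max} > 0$, $\delta_{min} > 0$,
$0 < \eta_1 < \eta_2 < 1$, $\alpha_1 \in (0, 1]$, $\alpha_2 > 1$, $x_c \in \Re^n$, $ared$, $pred$,

Compute $r = ared / pred$.

if $r < \eta_1$ then (step not accepted)
    $\delta_k = \alpha_1 * \delta_k$.

else if $r > \eta_2$ then (step accepted)

    $\delta_k = \min\{\delta_{max}, \max\{\delta_{min}, \alpha_2 * \delta_k\}\}$,
    $x_c = x_+$.

else (step accepted)

    $\delta_k = \max\{\delta_{min}, \delta_k\}$,
    $x_c = x_+$.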

end if 

We note that if the step is not accepted, the trust region radii are decreased without any
safeguard. However, if the step is accepted, the next trust region radius is set to be no smaller than a
predetermined positive value $\delta_{min}$. This strategy is extremely important in the global convergence
theory. It ensures that the trust region radius is bounded away from zero and hence that the
penalty parameters are bounded from above. This technique was introduced in [16].
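
The update rule can be sketched as follows. This is a minimal illustration rather than a
reference implementation: the function name `evaluate_step` and all default parameter values are
assumptions chosen for the example, `pred` is assumed to be positive, and the contraction factor is
taken strictly below one so that a rejected step actually shrinks the radii.

```python
def evaluate_step(ared, pred, deltas, x_c, x_plus,
                  eta1=0.1, eta2=0.75, alpha1=0.5, alpha2=2.0,
                  delta_min=1e-3, delta_max=1e3):
    """Return (next iterate, updated radii, accepted flag).

    All radii are scaled by the same factor, as in the simplified strategy.
    """
    r = ared / pred  # assumes pred > 0
    if r < eta1:
        # Step rejected: contract without any lower safeguard.
        return x_c, [alpha1 * d for d in deltas], False
    if r > eta2:
        # Very successful step: expand, but stay inside [delta_min, delta_max].
        new = [min(delta_max, max(delta_min, alpha2 * d)) for d in deltas]
        return x_plus, new, True
    # Acceptable step: keep the radii, but never below delta_min.
    return x_plus, [max(delta_min, d) for d in deltas], True


print(evaluate_step(0.9, 1.0, [1.0, 1.0], "x_c", "x_plus"))
```

The lower safeguard `delta_min` is applied only on accepted steps, which is the feature the
convergence theory relies on.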

3.2.5 The Stopping Criteria 

We use the first order necessary conditions for problem EQC to terminate the algorithm and require 
that 

$\|C_1(y_0)\| \le \epsilon_{tol}$,
$\|C_2(y_1)\| \le \epsilon_{tol}$,
$\vdots$
$\|C_M(y_{M-1})\| \le \epsilon_{tol}$,
$\|P_M^T \nabla f(y_M)\| \le \epsilon_{tol}$

hold simultaneously. 

* Natalia Alexandrov. On implementation of multilevel algorithms for nonlinear equations and equality constrained 
optimization. In preparation. 

Since 

$\|s_k\| = O(\|C_k(y_{k-1})\|)$, 

if $\|C_k(y_{k-1})\|$ is small, $\|s_k\|$ will be small and the inner loop iterates $y_k$ will be close to each other, 
and in the limit we can show ([1]) that at least a subsequence of the generated sequence of the 
outer loop iterates will converge to a stationary point of problem EQC. 

The tolerance parameters $\epsilon_{tol}$ need not be the same, but for convenience, they are taken to be 
the same throughout the convergence analysis. 

The reason for requiring such a stopping criterion is both theoretical and practical. The conventional 
test for the entire norm of the constraint residual being close to 0 does not differentiate between the 
individual $\|C_k(y_{k-1})\|$. It is essential for the convergence proof to determine how close to feasibility 
an iterate must be in order for the penalty parameters not to be increased. This is a measure of 
feasibility versus optimality. The conventional stopping criterion allows only the total feasibility to 
be measured and thus to determine when $\rho_M$ does not have to be increased. But even if $\rho_M$ is not 
increased, $\rho_1, \ldots, \rho_{M-1}$ may have to be increased because of the relative sizes of the component 
block norms. The conventional criterion does not allow us to measure relative feasibility of one 
block of constraints with respect to the others. 

In practice, we do not wish to evaluate the residuals at the same point just for the sake of the 
stopping criterion. 

Other stopping criteria are possible, but the one above is the most natural one. 
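
The blockwise test can be sketched as below. The helper names `norm` and `blockwise_converged`
are hypothetical, a single tolerance is assumed for all blocks, and the caller is assumed to pass
the residual of each block at the inner iterate where it was processed together with the projected
gradient of the objective.

```python
import math


def norm(v):
    """Euclidean norm of a sequence of floats."""
    return math.sqrt(sum(t * t for t in v))


def blockwise_converged(block_residuals, projected_gradient, eps_tol=1e-6):
    """True when every block residual and the projected gradient are small.

    block_residuals[k] is the residual vector of one constraint block,
    evaluated at the inner iterate where that block was processed.
    """
    feasible = all(norm(c) <= eps_tol for c in block_residuals)
    stationary = norm(projected_gradient) <= eps_tol
    return feasible and stationary


print(blockwise_converged([[1e-8, 0.0], [1e-9]], [0.0, 1e-7]))
```

Because each block is tested separately, the residuals already computed in the inner loop can be
reused and no extra evaluation at a common point is needed.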

3.2.6 The Statement of the Algorithm 

The formal description of the algorithm follows. 

Let the constraints be partitioned into M blocks. 

Algorithm 3.3 Multilevel Algorithm for Equality Constrained Optimization 

Given $\delta_k > 0$, $k = 1, \ldots, M$, $\delta_{max} > 0$, $\delta_{min} > 0$, $0 < \eta_1 < \eta_2 < 1$, $\alpha_1 \in (0, 1]$, $\alpha_2 > 1$, $x_c \in \Re^n$.

Outer Loop: Do until convergence: 

    $y_0 = x_c$.
    Compute the trial step. 

    Inner Loop: Do k = 1, M 
        If $y_{k-1}$ is not feasible then 

            Compute $s_k$ that satisfies a fraction of Cauchy decrease 
            condition on $\frac{1}{2} \|C_k(y_{k-1}) + \nabla C_k(y_{k-1})^T s\|_2^2$ restricted to 
            the intersection of the null spaces of $\nabla C_j(y_{j-1})^T s = 0$, $j = 1, \ldots, k - 1$, 
            and $\|s_k\| \le K_k \|C_k(y_{k-1})\|$ (satisfied automatically). 

            $y_k = y_{k-1} + s_k$.

        End if 

    End Inner Loop 

    Compute $s_{M+1}$ to satisfy the fraction of Cauchy decrease 
    condition on the subproblem: minimize $\phi_M(s)$ restricted to 
    the intersection of the null spaces of $J_j(y_{j-1}) s = 0$, $j = 1, \ldots, M$, 
    and $\|s\|_2 \le \delta_M$.
    $y_{M+1} = y_M + s_{M+1}$.

    $x_+ = y_{M+1}$.

    The trial step is: $s_c = s_1 + \cdots + s_{M+1}$.
    Update the penalty parameters 

    Evaluate the step and update the trust region radius 

    If the step is accepted, set $x_c = x_+$.

End Outer Loop 

We should note that there is an option to eliminate only a subset of constraints via the described 
procedure. In this case, the rest of the constraints and the objective function would be restricted to 
the intersection of the null spaces of the Jacobians of the processed constraints, and the resulting 
reduced optimization problem would be solved by a chosen method. The discussion of this approach 
is left for later work. 
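
The inner loop over constraint blocks can be sketched with NumPy as follows. This is an
illustration under stated assumptions, not the authors' code: the names `null_space_projector` and
`multilevel_constraint_step` are hypothetical, each block is supplied as a pair of callables
returning the residual and the Jacobian (rows are constraint gradients), a single radius `delta`
is used for all blocks, and each substep is simply the Cauchy step of the projected linearized
least-squares model, which is one way to satisfy a fraction of Cauchy decrease condition. The
objective substep is omitted.

```python
import numpy as np


def null_space_projector(jacobians, n):
    """Orthogonal projector onto {s : J s = 0 for every J in jacobians}."""
    if not jacobians:
        return np.eye(n)
    A = np.vstack(jacobians)
    return np.eye(n) - np.linalg.pinv(A) @ A


def multilevel_constraint_step(blocks, x_c, delta):
    """Process the constraint blocks in order; return y_M and the substeps."""
    y = np.array(x_c, dtype=float)
    n = y.size
    processed = []  # Jacobians of the blocks handled so far
    steps = []
    for residual, jacobian in blocks:
        c, J = residual(y), jacobian(y)
        P = null_space_projector(processed, n)
        g = P @ (J.T @ c)  # projected gradient of 0.5*||c + J s||^2 at s = 0
        s = np.zeros(n)
        g_norm = np.linalg.norm(g)
        if g_norm > 0.0:
            Jg = J @ g
            t = (g @ g) / (Jg @ Jg)      # exact minimizer along -g
            t = min(t, delta / g_norm)   # stay inside the trust region
            s = -t * g
        y = y + s
        steps.append(s)
        processed.append(J)
    return y, steps


# Two linear blocks in R^3: x0 = 1, then x0 + x1 = 3.
blocks = [
    (lambda y: np.array([y[0] - 1.0]), lambda y: np.array([[1.0, 0.0, 0.0]])),
    (lambda y: np.array([y[0] + y[1] - 3.0]), lambda y: np.array([[1.0, 1.0, 0.0]])),
]
y_M, steps = multilevel_constraint_step(blocks, np.zeros(3), delta=10.0)
print(y_M)
```

In this small example the second substep lies in the null space of the first block's Jacobian, so
the progress made on the first block is not undone when the second block is processed.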

4 Global Convergence Results 

In this section we give a summary of the global convergence theory for multilevel algorithms. 

4.1 Basic Ingredients of a Global Convergence Proof 

Our proof contains the general ingredients of a global convergence analysis for a trust-region 
method. The first three are required for a typical analysis of an unconstrained minimization
algorithm. 

1. The trial step must be shown to satisfy a sufficient predicted decrease condition, usually the 
FCD condition. Our algorithm assumes that the substeps satisfy the FCD condition on the 
subproblems. It remains for us to show that the total step from x c to x+ satisfies a suitable 
decrease condition. 

2. The difference between the actual and predicted reduction must be bounded above by at least 
a constant multiple of the square of the total step norm plus multiples of higher powers of 
the step norm. This is easily shown for multilevel algorithms. 

3. The algorithm must be shown to be well-defined, i.e., we must prove that the ratio of the 
actual reduction to predicted reduction can be made greater than a given $\eta_1 \in (0, 1)$ after a 
finite number of trial step computations. Given 2, it is easy to show that as the trust region 
radius approaches zero, the ratio of the actual reduction to predicted reduction approaches 
one. For the algorithm to be well-defined we must show that the ratio of the predicted to 
actual reduction approaches one faster than the trust region radius goes to zero. This is easily 
established for our algorithm. 

An algorithm for constrained optimization that uses penalty parameters in its merit function 
requires the fourth ingredient. 
4. The penalty parameter in the merit function must be shown to be bounded. The technique is 
to prove that the product of the penalty parameter and the trust region radius is bounded by 
a constant independent of the iterates. The sequence of the trust region radii is then shown 
to be bounded away from zero. Here a crucial role is played by the trust region updating 
technique introduced in [16]: after a successful iteration and before starting the next iteration, 
the trust region radius is set to be no smaller than a pre-defined value. This way of updating 
allows us to prove that the sequence of penalty parameters is bounded from above. 

The method for updating the penalty parameters ensures that the sequence of penalty
parameters is nondecreasing §, which, together with its boundedness, allows us to conclude that the 
penalty parameter sequence converges and, moreover, remains constant after a finite number 
of increases. This fact is used in the global convergence theorem. 
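
The step from boundedness to a finite number of increases can be made explicit with a short
worked bound. The notation here is an assumption introduced for illustration: $\rho^{(j)}$ denotes
the value of one penalty parameter after its $j$-th increase, $\rho^{(0)} = 1$ is its initial value,
$\rho^{*}$ is an upper bound on the sequence, and every increase is assumed to raise the parameter
by at least the fixed safeguard constant $\beta > 0$ used in the penalty update.

```latex
\rho^{(0)} = 1, \qquad
\rho^{(j+1)} \ge \rho^{(j)} + \beta, \qquad
\rho^{(j)} \le \rho^{*}
\quad\Longrightarrow\quad
1 + N\beta \le \rho^{(N)} \le \rho^{*}
\quad\Longrightarrow\quad
N \le \frac{\rho^{*} - 1}{\beta}.
```

Under these assumptions the number $N$ of increases is finite, so the parameter is eventually
constant.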

4.2 Assumptions 

We make the following assumptions on the problem and the sequence of steps and iterates: 

• $f$, $C$ are at least twice continuously differentiable. 

• The gradient of the constraints has full rank. This is a strong assumption, but it is a standard 
practice to require it for the sake of convergence proofs. Practical experience suggests that 
the breakdown of this assumption does not necessarily diminish the efficacy of our algorithm. 
Not assuming full rank would allow us to prove a slightly weaker convergence result. 

• $f(x)$, $\nabla f(x)$, $\nabla^2 f(x)$, $H_M$, $C(x)$, $\nabla C(x)$, $\nabla C_k(x)$, $\nabla^2 C_j(x)$, $j = 1, \ldots, m$,
$\{[P_{k-1}^T \nabla C_k(x)]^T [P_{k-1}^T \nabla C_k(x)]\}^{-1}$, $k = 1, \ldots, M$, are all uniformly bounded in norm for all $x$
in the domain of interest. 

Since we require that the Hessian of the objective function be only bounded, we can even take 
it to be 0. Of course, such an approximation would lower the effectiveness of the algorithm. 

4.3 Summary of the Proof 

In this subsection we provide an overview of steps in the convergence proof. The details can be 
found in [1] and Alexandrov and Dennis ¶.

• We show that under our assumptions, the norm of any intermediate sum of the substeps is 
bounded by a constant times the norm of the total trial step. 

• Several technical results provide workable expressions of the FCD (fraction of Cauchy
decrease) condition similar to the one used for unconstrained optimization. 

• A standard result provides an upper bound on the error between actual reduction and 
predicted reduction. 

§ The global convergence theory for algorithms with nonmonotone penalty parameters has been investigated by 
Mahmoud El-Alem, [12]. 

¶ Natalia Alexandrov and J. E. Dennis, Jr. A class of general trust-region multilevel algorithms for systems of 
nonlinear equations: Global convergence theory. In preparation. 

• By virtue of the penalty parameter updating scheme, the multilevel algorithms have the 
property that if an iterate is feasible, the penalty parameters are not increased. We show 
that if the iterates are sufficiently close to feasibility, the penalty parameters are not increased 
either. This result is crucial to the proof of convergence, giving a sufficient condition for the 
penalty parameters not to be increased. 

• Next we establish an upper bound on the product of the penalty parameters with the trust 
region radii. This result allows us to conclude that the radii are bounded below if the penalty 
parameters increase. The penalty parameter sequences are shown to be nondecreasing, which, 
together with their boundedness from above, allows us to conclude that the penalty
parameters tend to a limit, and, moreover, stay constant after a finite number of outer iterations. 
The limit is shown to exist, but its explicit expression is not known. 

• We have shown that the total trust region radius is bounded away from zero if any of the 
penalty parameters are increased. Now we show that the radius is always bounded away from 
zero. The trust region updating strategy ensures that it is bounded from above. 

• The next result guarantees that the algorithm is well defined, i.e., that after a finite number 
of outer loop iterations an acceptable step $s_c$ with 

$\frac{ared}{pred} \ge \eta_1$

will be found. 

• In the global convergence result, we show that if the objective function is bounded below, then 
the sequence of iterates generated by a multilevel algorithm has a subsequence convergent to 
a stationary point of the equality constrained minimization problem. 

• As a corollary, we can now conclude that the multilevel algorithm for nonlinear equations is 
also globally convergent. 
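
The well-definedness result above follows a standard pattern that can be sketched as a worked
inequality. The constants $\kappa_a$, $\kappa_s$, and $\kappa_p$ are hypothetical names introduced
here for illustration only: assume $|ared - pred| \le \kappa_a \|s_c\|^2$, $\|s_c\| \le \kappa_s \delta$,
and, at an iterate that is not stationary, $pred \ge \kappa_p \delta$ for all sufficiently small
radii $\delta$. Higher-order terms in the step norm are omitted.

```latex
\left| \frac{ared}{pred} - 1 \right|
  = \frac{|ared - pred|}{pred}
  \le \frac{\kappa_a \kappa_s^2 \delta^2}{\kappa_p \delta}
  = \frac{\kappa_a \kappa_s^2}{\kappa_p}\, \delta ,
\qquad\text{so}\qquad
\frac{ared}{pred} \ge \eta_1
\quad\text{whenever}\quad
\delta \le \frac{(1 - \eta_1)\, \kappa_p}{\kappa_a \kappa_s^2}.
```

Under these assumptions, a finite number of contractions of the radius by a fixed factor smaller
than one reaches this threshold, after which the trial step is accepted.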


5 Discussion and Concluding Remarks 

We have described a broad new class of multilevel algorithms for solving the nonlinear equations 
problem and the equality constrained optimization problem. The class can be considered as a 
globalization and an extension of the local class of algorithms of Brown and Brent for solving 
nonlinear systems of equations. 

The main practical appeal of the multilevel algorithms is that in the case of equality constrained 
optimization, they allow the user to partition the constraint system arbitrarily, to fit the application, 
and to process the blocks of constraints separately. In their finite-difference derivative form, they 
require fewer function evaluations than Newton's method. 

The multilevel class is characterized by requiring very mild conditions to be imposed on the 
trial steps. All reasonable algorithms satisfy these conditions automatically. 

We have established global convergence theory for the entire class. The theory implies conver- 
gence of the nonlinear equations solver, which, to the author’s knowledge, is the first theoretically 
supported method for globalizing Brown-Brent methods. The global convergence theory was made 
possible by the introduction of the new merit function that takes into account the order of the 
constraint processing. The nested penalty parameters are updated by an extension of the scheme 
proposed by El-Alem [10]. 

The algorithms are expected to be applicable to problems of multidisciplinary design 
optimization and to serve as a foundation for the study of the general multilevel optimization 
problem. 

We would like to mention one more application. The design of complex engineering systems is 
by nature a multicriteria optimization problem. The design projects are distinguished by very large 
numbers of variables, constraints, and expensive analyses. To solve the problem, it is necessary 
to break it into disciplines, each of which produces its own optimal design. The discipline designs 
are then incorporated into a total design. The multilevel methods proposed here would allow 
researchers to integrate constraints obtained from different sources. 

To solve the multicriteria optimization problem, it is necessary to decide when an iterate is 
optimal. One of the approaches to optimality is the statement of the multicriteria problem as a 
multilevel optimization problem, i.e., the problem of minimizing a function on a feasible set, which 
is an optimal set for another function, and so on. In such an approach, the user places priorities 
on the optimization problems that are to be solved sequentially. We believe that the multilevel 
algorithms proposed here will serve as a beginning for a detailed study of the general multilevel 
optimization problem. 

Directions of research in progress include local convergence rates, implementation, extensive 
testing on applications, incorporation of bound and inequality constraints, and extensions to general 
nonlinear bilevel and multilevel optimization. 